Automatic Refinement of Syntactic Categories in Chinese Word Structures

Author

  • Jianqiang Ma
Abstract

Annotated word structures are useful for various Chinese NLP tasks, such as word segmentation, POS tagging, and syntactic parsing. Because Chinese word formation is syntactic in nature, Chinese word structures are often represented as binary trees whose nodes are labeled with syntactic categories. It is desirable to refine the annotation by labeling the nodes of word structure trees with more appropriate syntactic categories, so that the combinatorial properties of the word formation process are better captured. This can improve performance on tasks that exploit word structure annotations. We propose syntactically inspired algorithms that automatically induce the syntactic categories of word structure trees from a POS-tagged corpus and the branching in existing Chinese word structure trees. We evaluate the quality of our annotation by comparing the performance of models based on our annotation with that of models based on another publicly available annotation. Results on two variations of the Chinese word segmentation task show that using our annotation can lead to significant performance improvements.
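As a rough illustration of the representation the abstract describes, the sketch below models a word structure as a binary tree whose nodes carry syntactic category labels. The class name `WordNode`, the helper functions, the example word 打火机 ("lighter"), and the labels `NN` and `VV` are all assumptions made for this example; they are not taken from the paper's annotation scheme or its induction algorithms.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordNode:
    """One node of a binary word-structure tree (hypothetical layout)."""
    label: str                          # syntactic category; tag set is illustrative
    text: str                           # characters covered by this node
    left: Optional["WordNode"] = None   # both children are None for a leaf
    right: Optional["WordNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(node: WordNode) -> List[str]:
    """Return the leaf strings of the tree from left to right."""
    if node.is_leaf():
        return [node.text]
    return leaves(node.left) + leaves(node.right)


def bracketed(node: WordNode) -> str:
    """Render the tree in labeled bracket notation."""
    if node.is_leaf():
        return f"({node.label} {node.text})"
    return f"({node.label} {bracketed(node.left)} {bracketed(node.right)})"


# Illustrative analysis only: [[打 火] 机], "strike-fire machine".
word = WordNode(
    "NN", "打火机",
    left=WordNode("VV", "打火",
                  left=WordNode("VV", "打"),
                  right=WordNode("NN", "火")),
    right=WordNode("NN", "机"),
)

print(bracketed(word))  # (NN (VV (VV 打) (NN 火)) (NN 机))
```

Refining the annotation, in the sense used above, would amount to changing the `label` values on such nodes while leaving the branching untouched.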


Similar resources

Keynote Speech: Lexical Semantics of Chinese Language

In this talk, we give a systematic view of the lexical semantics of the Chinese language. From a macro perspective, lexical conceptual meanings are classified into hierarchical semantic types, and each type serves particular semantic functions of Host, Attribute, and Value to form a semantic compositional system. Lexical senses and their compositional functions will be exemp...


Learning Grammar with Explicit Annotations for Subordinating Conjunctions

Data-driven approaches to parsing may suffer from data sparsity when entirely unsupervised. External knowledge has been shown to be an effective way to alleviate this problem. Subordinating conjunctions impose important constraints on Chinese syntactic structures. This paper proposes a method to develop a grammar with hierarchical category knowledge of subordinating conjunctions as explicit anno...


Design of Chinese Morphological Analyzer

This pilot study aims to design a Chinese morphological analyzer that can predict the syntactic and semantic properties of nominal, verbal, and adjectival compounds. The morphological structures of compound words contain information essential for determining their syntactic and semantic characteristics. In particular, morphological analysis is a primary step for predicti...


Syntactic Function-Based Chinese Lexical Categories and Category Grammar Parsing

By merging syntactic categories of word classes, lexical categories were obtained. By demonstrating combination and type raising rules respectively from curried and uncurried perspectives, a category combination algorithm was presented, in which application, composition and type raising rules were sequentially examined, and the first available rule was selected. A Chinese CCG parser was develop...


Sixth International Joint Conference on Natural Language Processing Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing




Publication date: 2014